Conduct an analysis of the contributions made by uniformed personnel from "Kenya" to UN Missions, with a focus on gender breakdown. Download the historical contributions dataset from the Peace & Security Data Hub, and perform a detailed examination to uncover trends, including troop numbers, mission types, duration of engagements, and temporal changes. Output the results by creating comprehensive data visualizations using Excel.
Import the dataframe:
Presentation and Description:
dataframe.head(), dataframe.shape, and dataframe.columns.
Handling Duplicates and Null Values:
dataframe.duplicated() and dataframe.drop_duplicates().
dataframe.isnull().sum() and replace NaN values in specific columns.
Managing Dates and Data Types:
dataframe.dtypes and convert columns with astype() for better analysis.
Derived Variables:
Sampling for Focused Analysis:
dataframe_kenya, for a targeted analysis on contributions and personnel.
dataframe_kenya to preview the data.
Data Encoding for Enhanced Analysis:
mission_acronym and personnel_type into a numerical format.
Active Engagement:
Yearly Impact:
Gender Parity:
Troop Contribution:
Conclusion:
Annex:
I have developed a Power BI application to enhance accessibility and visualization. The Power BI application provides a dynamic and interactive exploration of the analysis, offering a user-friendly interface. Access the project here.
The map and graphics were created using Python, leveraging Pandas and Plotly libraries for visualization. Access the notebook on GitHub here.
Introduction:
Active Engagement:
Gender Parity:
Other Information:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
url = "https://api.psdata.un.org/public/data/DPO-UCHISTORICAL/csv"
dataframe = pd.read_csv(url)
dataframe.head(20)
| | id | contribution_id | last_reporting_date | isocode3 | mission_acronym | personnel_type | female_personnel | male_personnel | m49_code | contributing_country |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 434570 | 2010-01-31 | TUN | MONUC | Experts on Mission | 0.0 | 31.0 | 788 | Tunisia |
| 1 | 2 | 434569 | 2010-01-31 | TUN | MINURCAT | Troops | 0.0 | 3.0 | 788 | Tunisia |
| 2 | 3 | 434568 | 2010-01-31 | TUN | MINURCAT | Experts on Mission | 0.0 | 4.0 | 788 | Tunisia |
| 3 | 4 | 434567 | 2010-01-31 | TGO | UNOCI | Troops | 0.0 | 314.0 | 768 | Togo |
| 4 | 5 | 434566 | 2010-01-31 | TGO | UNOCI | Experts on Mission | 0.0 | 7.0 | 768 | Togo |
| 5 | 6 | 434565 | 2010-01-31 | TGO | UNOCI | Individual Police | 0.0 | 16.0 | 768 | Togo |
| 6 | 7 | 434564 | 2010-01-31 | TGO | UNMIL | Troops | 0.0 | 1.0 | 768 | Togo |
| 7 | 8 | 434563 | 2010-01-31 | TGO | UNMIL | Experts on Mission | 0.0 | 2.0 | 768 | Togo |
| 8 | 9 | 434562 | 2010-01-31 | TGO | UNAMID | Experts on Mission | 0.0 | 8.0 | 768 | Togo |
| 9 | 10 | 434561 | 2010-01-31 | TGO | UNAMID | Individual Police | 0.0 | 1.0 | 768 | Togo |
| 10 | 11 | 434560 | 2010-01-31 | TGO | MONUC | Individual Police | 0.0 | 18.0 | 768 | Togo |
| 11 | 12 | 434559 | 2010-01-31 | TGO | MINUSTAH | Individual Police | 0.0 | 4.0 | 768 | Togo |
| 12 | 13 | 434558 | 2010-01-31 | TGO | MINURCAT | Troops | 0.0 | 457.0 | 768 | Togo |
| 13 | 14 | 434557 | 2010-01-31 | TGO | MINURCAT | Individual Police | 0.0 | 8.0 | 768 | Togo |
| 14 | 15 | 434556 | 2010-01-31 | THA | UNMIT | Individual Police | 5.0 | 13.0 | 764 | Thailand |
| 15 | 16 | 434555 | 2010-01-31 | THA | UNMIS | Experts on Mission | 1.0 | 9.0 | 764 | Thailand |
| 16 | 17 | 434554 | 2010-01-31 | THA | UNAMID | Troops | 2.0 | 8.0 | 764 | Thailand |
| 17 | 18 | 434553 | 2010-01-31 | THA | UNAMID | Experts on Mission | 0.0 | 9.0 | 764 | Thailand |
| 18 | 19 | 434552 | 2010-01-31 | TZA | UNOCI | Troops | 3.0 | 0.0 | 834 | United Republic of Tanzania |
| 19 | 20 | 434551 | 2010-01-31 | TZA | UNMIS | Experts on Mission | 0.0 | 12.0 | 834 | United Republic of Tanzania |
print(dataframe.shape)
print(dataframe.columns)
(146967, 10)
Index(['id', 'contribution_id', 'last_reporting_date', 'isocode3',
'mission_acronym', 'personnel_type', 'female_personnel',
'male_personnel', 'm49_code', 'contributing_country'],
dtype='object')
The DataFrame has a shape of (146967, 10), meaning it contains 146,967 rows and 10 columns.
The columns in the DataFrame are as follows:
id: An identifier for each row.
contribution_id: An identifier for contributions.
last_reporting_date: Date of the last reporting.
isocode3: A three-letter country code.
mission_acronym: Acronym for the mission.
personnel_type: Type of personnel involved.
female_personnel: Count of female personnel.
male_personnel: Count of male personnel.
m49_code: A numerical country code.
contributing_country: The country contributing personnel.
# Rows that share a contribution_id with at least one other row
duplicates = dataframe[dataframe.duplicated(subset='contribution_id', keep=False)]
if not duplicates.empty:
    print("There are duplicates based on the contribution_id column:")
    print(duplicates)
else:
    print("No duplicates found based on contribution_id column.")
No duplicates found based on contribution_id column.
dataframe.isnull().sum()
id 0
contribution_id 0
last_reporting_date 0
isocode3 0
mission_acronym 0
personnel_type 0
female_personnel 14
male_personnel 22
m49_code 0
contributing_country 0
dtype: int64
We have identified two columns with missing values: female_personnel has 14 occurrences, and male_personnel has 22 occurrences. To address this, we are going to replace these missing values with the numeric value "0".
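# Assign the result back instead of using inplace=True on a column selection
dataframe["female_personnel"] = dataframe["female_personnel"].fillna(0)
dataframe["male_personnel"] = dataframe["male_personnel"].fillna(0)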
dataframe.isnull().sum()
id 0
contribution_id 0
last_reporting_date 0
isocode3 0
mission_acronym 0
personnel_type 0
female_personnel 0
male_personnel 0
m49_code 0
contributing_country 0
dtype: int64
From the last_reporting_date column, we derive a year column for our dataframe. The dates are stored as YYYY-MM-DD strings, so the first 4 characters of each last_reporting_date value give the year.
dataframe["year"]=dataframe["last_reporting_date"].str[:4]
dataframe['id'] = dataframe['id'].astype(int)
dataframe['contribution_id'] = dataframe['contribution_id'].astype(int)
dataframe['last_reporting_date'] = dataframe['last_reporting_date'].astype(str)
dataframe['mission_acronym'] = dataframe['mission_acronym'].astype(str)
dataframe['personnel_type'] = dataframe['personnel_type'].astype(str)
dataframe['female_personnel'] = dataframe['female_personnel'].astype(int)
dataframe['male_personnel'] = dataframe['male_personnel'].astype(int)
dataframe['m49_code'] = dataframe['m49_code'].astype(int)
dataframe['contributing_country'] = dataframe['contributing_country'].astype(str)
dataframe['year'] = dataframe['year'].astype(int)
print(dataframe.dtypes)
id int32 contribution_id int32 last_reporting_date object isocode3 object mission_acronym object personnel_type object female_personnel int32 male_personnel int32 m49_code int32 contributing_country object year int32 dtype: object
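As an alternative to string slicing, the year can be read from a parsed date. The sketch below works on a small hypothetical frame (`sample`) rather than the downloaded data, and assumes the dates are in `YYYY-MM-DD` form as in the preview above.

```python
import pandas as pd

# Hypothetical sample mimicking the last_reporting_date column
sample = pd.DataFrame(
    {"last_reporting_date": ["2010-01-31", "2017-06-30", "2023-12-31"]}
)

# Parse the strings once, then derive calendar parts from the datetime accessor
parsed = pd.to_datetime(sample["last_reporting_date"], format="%Y-%m-%d")
sample["year"] = parsed.dt.year
sample["month"] = parsed.dt.month

print(sample)
```

Parsing also makes the month available, which is useful if the monthly reporting pattern needs to be inspected later.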
total_personnel:
female_percent:
male_percent:
dataframe["total_personnel"] = dataframe["female_personnel"] + dataframe["male_personnel"]
dataframe["female_percent"] = round(dataframe["female_personnel"] * 100 / dataframe["total_personnel"],2)
dataframe["male_percent"] = round(dataframe["male_personnel"] * 100 / dataframe["total_personnel"],2)
These derived variables provide additional insights into the gender distribution within the personnel, offering a more comprehensive view of the data.
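Rows where both counts are 0 have a total of 0, so their percentages are undefined. A minimal sketch of one way to make that explicit, using a hypothetical three-row frame (`sample`); the first row uses counts that appear in the data (44 women, 681 men), the last row is an illustrative empty record.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last one has no personnel at all
sample = pd.DataFrame(
    {"female_personnel": [44, 1, 0], "male_personnel": [681, 3, 0]}
)
sample["total_personnel"] = sample["female_personnel"] + sample["male_personnel"]

# Replace zero totals with NaN so the percentage is NaN rather than 0/0
safe_total = sample["total_personnel"].replace(0, np.nan)
sample["female_percent"] = (sample["female_personnel"] * 100 / safe_total).round(2)
sample["male_percent"] = (sample["male_personnel"] * 100 / safe_total).round(2)

print(sample)
```

Keeping these rows as NaN, rather than filling them with 0, stops empty records from pulling average percentages down.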
Filtering Data:
dataframe['contributing_country'] == "Kenya" to create a new dataframe, dataframe_kenya, containing only entries from Kenya.
Analysis Focus:
Displaying Sample:
dataframe_kenya to provide a preview for analysis.
This targeted approach enables a more efficient and specific analysis of Kenya's impact and characteristics in the dataset.
dataframe_kenya =dataframe[(dataframe['contributing_country'] == "Kenya")]
dataframe_kenya.head(20)
| | id | contribution_id | last_reporting_date | isocode3 | mission_acronym | personnel_type | female_personnel | male_personnel | m49_code | contributing_country | year | total_personnel | female_percent | male_percent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 824 | 825 | 434193 | 2010-01-31 | KEN | UNMIS | Troops | 44 | 681 | 404 | Kenya | 2010 | 725 | 6.07 | 93.93 |
| 825 | 826 | 434192 | 2010-01-31 | KEN | UNMIS | Experts on Mission | 1 | 3 | 404 | Kenya | 2010 | 4 | 25.00 | 75.00 |
| 826 | 827 | 434191 | 2010-01-31 | KEN | UNMIS | Individual Police | 1 | 18 | 404 | Kenya | 2010 | 19 | 5.26 | 94.74 |
| 827 | 828 | 434190 | 2010-01-31 | KEN | UNMIL | Individual Police | 5 | 16 | 404 | Kenya | 2010 | 21 | 23.81 | 76.19 |
| 828 | 829 | 434189 | 2010-01-31 | KEN | UNAMID | Troops | 6 | 74 | 404 | Kenya | 2010 | 80 | 7.50 | 92.50 |
| 829 | 830 | 434188 | 2010-01-31 | KEN | UNAMID | Experts on Mission | 1 | 5 | 404 | Kenya | 2010 | 6 | 16.67 | 83.33 |
| 830 | 831 | 434187 | 2010-01-31 | KEN | MONUC | Experts on Mission | 1 | 23 | 404 | Kenya | 2010 | 24 | 4.17 | 95.83 |
| 831 | 832 | 434186 | 2010-01-31 | KEN | MINURCAT | Troops | 0 | 4 | 404 | Kenya | 2010 | 4 | 0.00 | 100.00 |
| 1575 | 1576 | 435091 | 2010-02-28 | KEN | UNMIS | Troops | 44 | 681 | 404 | Kenya | 2010 | 725 | 6.07 | 93.93 |
| 1576 | 1577 | 435090 | 2010-02-28 | KEN | UNMIS | Experts on Mission | 1 | 3 | 404 | Kenya | 2010 | 4 | 25.00 | 75.00 |
| 1577 | 1578 | 435089 | 2010-02-28 | KEN | UNMIS | Individual Police | 1 | 18 | 404 | Kenya | 2010 | 19 | 5.26 | 94.74 |
| 1578 | 1579 | 435088 | 2010-02-28 | KEN | UNMIL | Individual Police | 5 | 16 | 404 | Kenya | 2010 | 21 | 23.81 | 76.19 |
| 1579 | 1580 | 435087 | 2010-02-28 | KEN | UNAMID | Troops | 5 | 76 | 404 | Kenya | 2010 | 81 | 6.17 | 93.83 |
| 1580 | 1581 | 435086 | 2010-02-28 | KEN | UNAMID | Experts on Mission | 1 | 5 | 404 | Kenya | 2010 | 6 | 16.67 | 83.33 |
| 1581 | 1582 | 435085 | 2010-02-28 | KEN | MONUC | Experts on Mission | 1 | 23 | 404 | Kenya | 2010 | 24 | 4.17 | 95.83 |
| 1582 | 1583 | 435084 | 2010-02-28 | KEN | MINURCAT | Troops | 0 | 4 | 404 | Kenya | 2010 | 4 | 0.00 | 100.00 |
| 2464 | 2465 | 435984 | 2010-03-31 | KEN | UNMIS | Troops | 44 | 681 | 404 | Kenya | 2010 | 725 | 6.07 | 93.93 |
| 2465 | 2466 | 435983 | 2010-03-31 | KEN | UNMIS | Experts on Mission | 1 | 3 | 404 | Kenya | 2010 | 4 | 25.00 | 75.00 |
| 2466 | 2467 | 435982 | 2010-03-31 | KEN | UNMIS | Individual Police | 1 | 18 | 404 | Kenya | 2010 | 19 | 5.26 | 94.74 |
| 2467 | 2468 | 435981 | 2010-03-31 | KEN | UNMIL | Individual Police | 5 | 16 | 404 | Kenya | 2010 | 21 | 23.81 | 76.19 |
The process of encoding data involves transforming categorical variables, such as 'mission_acronym' and 'personnel_type,' into a numerical format that can be utilized in statistical analyses. In this case, we employ one-hot encoding to convert these categorical columns into binary indicators, facilitating the exploration of relationships and patterns in the subsequent correlation map. The resulting encoded DataFrame, with new binary columns representing each category, ensures a more comprehensive and effective analysis of the dataset's underlying structure and associations.
columns_to_encode = ['mission_acronym', 'personnel_type']
encoded_df = pd.get_dummies(dataframe_kenya, columns=columns_to_encode)
encoded_df[encoded_df.columns.difference(dataframe_kenya.columns)] = encoded_df[encoded_df.columns.difference(dataframe_kenya.columns)].astype(int)
encoded_df.columns
Index(['id', 'contribution_id', 'last_reporting_date', 'isocode3',
'female_personnel', 'male_personnel', 'm49_code',
'contributing_country', 'year', 'total_personnel', 'female_percent',
'male_percent', 'mission_acronym_MINURCAT', 'mission_acronym_MINURSO',
'mission_acronym_MINUSCA', 'mission_acronym_MINUSMA',
'mission_acronym_MONUC', 'mission_acronym_MONUSCO',
'mission_acronym_UNAMID', 'mission_acronym_UNIFIL',
'mission_acronym_UNISFA', 'mission_acronym_UNMHA',
'mission_acronym_UNMIL', 'mission_acronym_UNMIS',
'mission_acronym_UNMISS', 'mission_acronym_UNSMIS',
'mission_acronym_UNSOM', 'mission_acronym_UNSOS',
'personnel_type_Experts on Mission', 'personnel_type_Individual Police',
'personnel_type_Staff Officer', 'personnel_type_Troops'],
dtype='object')
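To make the effect of one-hot encoding concrete, here is a small sketch on a hypothetical three-row frame (`sample`). It passes `dtype=int` to `pd.get_dummies` so the indicator columns come out as 0/1 integers directly.

```python
import pandas as pd

# Hypothetical rows with the two categorical columns being encoded
sample = pd.DataFrame({
    "mission_acronym": ["UNMIS", "UNAMID", "UNMIS"],
    "personnel_type": ["Troops", "Troops", "Individual Police"],
    "total_personnel": [725, 80, 19],
})

# Each category becomes its own 0/1 column; the original columns are dropped
encoded = pd.get_dummies(
    sample, columns=["mission_acronym", "personnel_type"], dtype=int
)

print(encoded.columns.tolist())
print(encoded)
```

Each row has exactly one 1 among the `mission_acronym_*` columns and one among the `personnel_type_*` columns, which is why indicators from the same group are negatively correlated with each other by construction.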
kenya_about_world = pd.DataFrame()
kenya_about_world['number_of_contributions'] = dataframe.groupby('contributing_country')['contributing_country'].count()
kenya_about_world['sum_female_personnel'] = dataframe.groupby('contributing_country')['female_personnel'].sum()
kenya_about_world['sum_male_personnel'] = dataframe.groupby('contributing_country')['male_personnel'].sum()
kenya_about_world['total_personnel'] = kenya_about_world['sum_female_personnel']+kenya_about_world['sum_male_personnel']
kenya_about_world['percent_female_personnel'] = round(kenya_about_world['sum_female_personnel'] * 100 / kenya_about_world['total_personnel'],2)
kenya_about_world['percent_male_personnel'] = round(kenya_about_world['sum_male_personnel'] * 100 / kenya_about_world['total_personnel'],2)
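# Countries with zero total personnel produce NaN percentages; report them as 0
kenya_about_world["percent_female_personnel"] = kenya_about_world["percent_female_personnel"].fillna(0)
kenya_about_world["percent_male_personnel"] = kenya_about_world["percent_male_personnel"].fillna(0)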
kenya_about_world.iloc[70:91]
| contributing_country | number_of_contributions | sum_female_personnel | sum_male_personnel | total_personnel | percent_female_personnel | percent_male_personnel |
|---|---|---|---|---|---|---|
| Ireland | 1236 | 4020 | 64550 | 68570 | 5.86 | 94.14 |
| Israel | 3 | 0 | 42 | 42 | 0.00 | 100.00 |
| Italy | 1026 | 8632 | 178767 | 187399 | 4.61 | 95.39 |
| Jamaica | 191 | 448 | 673 | 1121 | 39.96 | 60.04 |
| Japan | 258 | 320 | 24226 | 24546 | 1.30 | 98.70 |
| Jordan | 3290 | 3036 | 278666 | 281702 | 1.08 | 98.92 |
| Kazakhstan | 295 | 75 | 4779 | 4854 | 1.55 | 98.45 |
| Kenya | 1939 | 13032 | 79910 | 92942 | 14.02 | 85.98 |
| Kiribati | 13 | 24 | 36 | 60 | 40.00 | 60.00 |
| Kyrgyzstan | 872 | 191 | 2515 | 2706 | 7.06 | 92.94 |
| Latvia | 117 | 21 | 260 | 281 | 7.47 | 92.53 |
| Lesotho | 89 | 0 | 140 | 140 | 0.00 | 100.00 |
| Liberia | 373 | 1264 | 11113 | 12377 | 10.21 | 89.79 |
| Libya | 2 | 1 | 4 | 5 | 20.00 | 80.00 |
| Lithuania | 376 | 286 | 2799 | 3085 | 9.27 | 90.73 |
| Luxembourg | 69 | 0 | 140 | 140 | 0.00 | 100.00 |
| Madagascar | 568 | 601 | 4230 | 4831 | 12.44 | 87.56 |
| Malawi | 1255 | 9195 | 111931 | 121126 | 7.59 | 92.41 |
| Malaysia | 1491 | 5970 | 139254 | 145224 | 4.11 | 95.89 |
| Mali | 869 | 771 | 10007 | 10778 | 7.15 | 92.85 |
| Malta | 61 | 18 | 546 | 564 | 3.19 | 96.81 |
It is crucial to compare Kenya's commitment to UN peace missions with its neighboring countries, as this comparative analysis provides valuable insights into regional dynamics and contributions to global peacekeeping efforts. Understanding the involvement of neighboring nations, such as Ethiopia, Somalia, South Sudan, Tanzania, and Uganda, allows for a comprehensive evaluation of the collective regional impact on UN missions.
sorted_df = kenya_about_world.sort_values(by="number_of_contributions",ascending=False)
sorted_df.head(3)
| contributing_country | number_of_contributions | sum_female_personnel | sum_male_personnel | total_personnel | percent_female_personnel | percent_male_personnel |
|---|---|---|---|---|---|---|
| Nepal | 4501 | 34739 | 781767 | 816506 | 4.25 | 95.75 |
| Bangladesh | 4020 | 40379 | 1204755 | 1245134 | 3.24 | 96.76 |
| Ghana | 3818 | 53534 | 392135 | 445669 | 12.01 | 87.99 |
sorted_df = kenya_about_world.sort_values(by="total_personnel",ascending=False)
sorted_df.head(3)
| contributing_country | number_of_contributions | sum_female_personnel | sum_male_personnel | total_personnel | percent_female_personnel | percent_male_personnel |
|---|---|---|---|---|---|---|
| Bangladesh | 4020 | 40379 | 1204755 | 1245134 | 3.24 | 96.76 |
| India | 2876 | 16714 | 1092456 | 1109170 | 1.51 | 98.49 |
| Pakistan | 2779 | 7578 | 1064238 | 1071816 | 0.71 | 99.29 |
Comparing Kenya's contributions to UN peace missions with top-contributing countries like Nepal, Bangladesh, and Ghana is crucial for benchmarking, understanding global dynamics, and optimizing resource allocation. This analysis helps identify best practices, assess the global impact, and strategically plan Kenya's future commitments in alignment with successful nations in UN peacekeeping efforts.
import plotly.graph_objects as go
import plotly.subplots as sp
# Total contributions
total_contribution = kenya_about_world['number_of_contributions'].sum()
# Kenya's contribution
kenya_contributions = kenya_about_world.loc["Kenya", "number_of_contributions"]
# Best contributors
nepal_contributions = kenya_about_world.loc["Nepal", "number_of_contributions"]
bangladesh_contribution = kenya_about_world.loc["Bangladesh", "number_of_contributions"]
ghana_contributions = kenya_about_world.loc["Ghana", "number_of_contributions"]
# Neighboring countries of Kenya
ethiopia_contributions = kenya_about_world.loc["Ethiopia", "number_of_contributions"]
somalia_contributions = 0
sudan_contributions = kenya_about_world.loc["Sudan", "number_of_contributions"]
tanzania_contributions = 0
uganda_contributions = kenya_about_world.loc["Uganda", "number_of_contributions"]
# Calculate contributions
# Group totals are sums of their members; "Other" is the remainder of the grand total
best_contributions = nepal_contributions + bangladesh_contribution + ghana_contributions
neighboring_contributions = ethiopia_contributions + somalia_contributions + sudan_contributions + tanzania_contributions + uganda_contributions
other_countries = total_contribution - kenya_contributions - best_contributions - neighboring_contributions
# Data for the first pie chart
contributions = [kenya_contributions, nepal_contributions, bangladesh_contribution, ghana_contributions, ethiopia_contributions, sudan_contributions, uganda_contributions, other_countries]
# The sixth slice is the "Sudan" row looked up above, so it is labeled "Sudan"
labels = ["Kenya", "Nepal", "Bangladesh", "Ghana", "Ethiopia", "Sudan", "Uganda", "Other"]
# Total personnel
total_personnel = kenya_about_world['total_personnel'].sum()
# Kenya's contribution
kenya_personnel = kenya_about_world.loc["Kenya", "total_personnel"]
# Best contributors
india_personnel = kenya_about_world.loc["India", "total_personnel"]
bangladesh_personnel = kenya_about_world.loc["Bangladesh", "total_personnel"]
pakistan_personnel = kenya_about_world.loc["Pakistan", "total_personnel"]
# Neighboring countries of Kenya
ethiopia_personnel = kenya_about_world.loc["Ethiopia", "total_personnel"]
somalia_personnel = 0
sudan_personnel = kenya_about_world.loc["Sudan", "total_personnel"]
tanzania_personnel = 0
uganda_personnel = kenya_about_world.loc["Uganda", "total_personnel"]
# Group totals are sums of their members; "Other" is the remainder of the grand total
best_personnel = india_personnel + bangladesh_personnel + pakistan_personnel
neighboring_personnel = ethiopia_personnel + somalia_personnel + sudan_personnel + tanzania_personnel + uganda_personnel
other_personnel = total_personnel - kenya_personnel - best_personnel - neighboring_personnel
personnel = [kenya_personnel, india_personnel, bangladesh_personnel, pakistan_personnel, ethiopia_personnel, sudan_personnel, uganda_personnel, other_personnel]
# The personnel ranking has different top contributors, so the second pie needs its own labels
personnel_labels = ["Kenya", "India", "Bangladesh", "Pakistan"] + labels[4:]
fig = sp.make_subplots(1, 2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=labels, values=contributions, name="Contributions by Country"),1, 1)
fig.add_trace(go.Pie(labels=personnel_labels, values=personnel, name="Personnel Contributions by Country"),1, 2)
fig.update_layout(title_text="Comparison of Contributions and Personnel Contributions by Country")
fig.show()
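For a pie chart, the slices should add up to the grand total, so the "Other" slice is the total minus the sum of the named slices. A minimal sketch of that bookkeeping, using a handful of `number_of_contributions` values copied from the tables above rather than the full dataset; the names `counts`, `named`, and `slices` are illustrative.

```python
import pandas as pd

# A few number_of_contributions values copied from the tables above (not the full dataset)
counts = pd.Series({
    "Kenya": 1939, "Nepal": 4501, "Bangladesh": 4020,
    "Ghana": 3818, "Ireland": 1236, "Italy": 1026,
})

# Countries that get their own slice
named = ["Kenya", "Nepal", "Bangladesh", "Ghana"]

# "Other" is whatever remains once the named slices are added together
other = counts.sum() - counts[named].sum()
slices = pd.concat([counts[named], pd.Series({"Other": other})])

# Sanity check: the slices must add back up to the grand total
assert slices.sum() == counts.sum()
print(slices)
```

A check like the final `assert` catches sign mistakes in the group totals before they distort the chart.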
import plotly.express as px
kenya_data = kenya_about_world.loc["Kenya"]
rest_of_world_data = kenya_about_world.drop("Kenya")
sum_female = rest_of_world_data["sum_female_personnel"].sum()
sum_male = rest_of_world_data["sum_male_personnel"].sum()
sum_total = sum_female + sum_male
avg_percent_female = sum_female * 100 / sum_total
avg_percent_male = sum_male * 100 / sum_total
data = pd.DataFrame({
"Country": ["Kenya", "Kenya", "Rest of the World", "Rest of the World"],
"Gender": ["Female", "Male", "Female", "Male"],
"Percentage": [kenya_data["percent_female_personnel"], kenya_data["percent_male_personnel"], avg_percent_female, avg_percent_male]
})
fig = px.bar(data, x="Country", y="Percentage", color="Gender",
labels={"Percentage": "Percentage"},
title="Comparison of Percent Female and Male Personnel in Kenya and Rest of the World")
fig.show()
A correlation map visually represents the strength and direction of the linear relationships between variables, which helps identify patterns and dependencies in the data. Here it is used to explore which factors move together in the contributions of uniformed personnel from Kenya to UN Missions.
import seaborn as sns
# include='number' selects all numeric columns regardless of the platform's default integer width
numeric_columns = encoded_df.select_dtypes(include='number').columns.difference(['m49_code','contribution_id','id'])
correlation_matrix = encoded_df[numeric_columns].corr()
plt.figure(figsize=(15, 10))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Map for Kenya DataFrame (excluding m49_code)')
plt.show()
The initial correlation map provides a comprehensive overview of relationships within the dataset, where numerous correlations are observed. To focus our analysis, we will delve into specific connections, narrowing our attention to the columns 'female_personnel,' 'male_personnel,' and 'year' for a more detailed examination.
# Select the rows and columns you want to display
selected_rows = ['female_personnel', 'male_personnel','year']
numeric_columns = encoded_df.select_dtypes(include='number').columns.difference(['m49_code', 'contribution_id', 'id'])
selected_columns = list(set(selected_rows).union(numeric_columns))
correlation_matrix = encoded_df[selected_columns].corr()
plt.figure(figsize=(15, 3))
sns.heatmap(correlation_matrix.loc[selected_rows, selected_columns], annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Map for Kenya DataFrame (selected rows and columns)')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.show()
In the next analysis, we filter the correlation map to display only relationships whose correlation coefficient has an absolute value greater than 0.25 (coefficients exactly equal to 1, such as a variable with itself, are also excluded). This threshold is a heuristic for highlighting the stronger associations, not a test of statistical significance; hiding the weaker associations gives a clearer and more focused view of the underlying patterns in the data.
# Select the rows and columns you want to display
selected_rows = ['female_personnel', 'male_personnel', 'year']
numeric_columns = encoded_df.select_dtypes(include='number').columns.difference(['m49_code', 'contribution_id', 'id'])
selected_columns = list(set(selected_rows).union(numeric_columns))
correlation_matrix = encoded_df[selected_columns].corr()
# Filter the correlation matrix based on the threshold
threshold = 0.25
significant_correlations = correlation_matrix[((correlation_matrix > threshold) | (correlation_matrix < -threshold)) & (correlation_matrix != 1)]
# Remove columns with no significant correlation
significant_columns = significant_correlations.columns[~significant_correlations.isna().all()]
plt.figure(figsize=(15, 3))
sns.heatmap(significant_correlations.loc[selected_rows, significant_columns], annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5, yticklabels=True)
plt.title('Significant Correlation Map for Kenya DataFrame (selected rows and columns)')
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.show()
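The thresholding step can be checked on a toy matrix. The sketch below uses a hypothetical 3×3 correlation matrix (`corr`) and masks by absolute value, with the diagonal removed by position rather than by comparing against 1; this is one possible formulation, not necessarily the one used for the figures here.

```python
import numpy as np
import pandas as pd

# Hypothetical correlation matrix for three variables
corr = pd.DataFrame(
    [[1.0, 0.6, -0.3], [0.6, 1.0, 0.1], [-0.3, 0.1, 1.0]],
    index=["a", "b", "c"],
    columns=["a", "b", "c"],
)

threshold = 0.25

# Boolean mask that is False on the diagonal and True elsewhere
off_diagonal = ~np.eye(len(corr), dtype=bool)

# Keep cells whose absolute value exceeds the threshold; everything else becomes NaN
strong = corr.where((corr.abs() > threshold) & off_diagonal)

print(strong)
```

Masking the diagonal by position keeps any off-diagonal pair that happens to correlate perfectly, which a `!= 1` comparison would hide.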
Correlation-Based Hypotheses for Kenya's Contributions to UN Missions:
Gender Composition in Troops:
Mission Types (UNMISS):
Total Personnel Impact:
Troops and Male Personnel:
Temporal Trends:
Gender Percentages:
Mission Types and Male Personnel (UNMIS):
These hypotheses provide a preliminary understanding for further validation through statistical tests, visualizations, and comprehensive qualitative analysis. Combining this with domain knowledge will enhance our insights into Kenya's contributions to UN Missions.
dataframe_kenya[['female_personnel','male_personnel','total_personnel']].describe()
| | female_personnel | male_personnel | total_personnel |
|---|---|---|---|
| count | 1939.00000 | 1939.000000 | 1939.000000 |
| mean | 6.72099 | 41.211965 | 47.932955 |
| std | 22.14494 | 133.077989 | 152.839133 |
| min | 0.00000 | 0.000000 | 0.000000 |
| 25% | 0.00000 | 2.000000 | 3.000000 |
| 50% | 1.00000 | 6.000000 | 7.000000 |
| 75% | 4.00000 | 11.000000 | 14.000000 |
| max | 204.00000 | 829.000000 | 1027.000000 |
Summary Statistics for 'female_personnel' and 'male_personnel' (Kenya rows):
Count: Both 'female_personnel' and 'male_personnel' have a count of 1,939, matching the number of Kenya rows and indicating no missing values.
Mean: The mean for 'female_personnel' is approximately 6.72, while 'male_personnel' is around 41.21. On average, a row reports about six times as many male personnel as female personnel.
Standard Deviation (std): Both columns exhibit high standard deviations (22.14 and 133.08, more than three times the respective means), signifying significant variability in personnel numbers.
Minimum (min) and Maximum (max): The range spans from 0 to 204 for female personnel and 0 to 829 for male personnel, suggesting a wide range of values and potential outliers.
Percentiles (25%, 50%, 75%): The 75th percentile for female personnel is 4, indicating that 75% of observations have a count of 4 or fewer female personnel. The medians (1 and 6) are far below the means, so both distributions are strongly right-skewed.
In summary, these descriptive statistics offer a snapshot of the distribution and central tendency of 'female_personnel' and 'male_personnel'. Further analysis, such as data visualization or hypothesis testing, may be necessary for a deeper understanding of the data distribution and relationships.
dataframe_kenya.describe()
| | id | contribution_id | female_personnel | male_personnel | m49_code | year | total_personnel | female_percent | male_percent |
|---|---|---|---|---|---|---|---|---|---|
| count | 1939.000000 | 1939.000000 | 1939.00000 | 1939.000000 | 1939.0 | 1939.000000 | 1939.000000 | 1932.000000 | 1932.000000 |
| mean | 81922.358948 | 308980.740072 | 6.72099 | 41.211965 | 404.0 | 2017.521403 | 47.932955 | 21.095880 | 78.904120 |
| std | 41144.041391 | 185362.987396 | 22.14494 | 133.077989 | 0.0 | 3.932339 | 152.839133 | 21.908605 | 21.908605 |
| min | 825.000000 | 32180.000000 | 0.00000 | 0.000000 | 404.0 | 2010.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 48693.500000 | 73060.500000 | 0.00000 | 2.000000 | 404.0 | 2014.000000 | 3.000000 | 0.000000 | 69.230000 |
| 50% | 84164.000000 | 425382.000000 | 1.00000 | 6.000000 | 404.0 | 2018.000000 | 7.000000 | 16.670000 | 83.330000 |
| 75% | 118324.500000 | 468362.500000 | 4.00000 | 11.000000 | 404.0 | 2021.000000 | 14.000000 | 30.770000 | 100.000000 |
| max | 146802.000000 | 505177.000000 | 204.00000 | 829.000000 | 404.0 | 2023.000000 | 1027.000000 | 100.000000 | 100.000000 |
female_sum = dataframe_kenya["female_personnel"].sum()
male_sum = dataframe_kenya["male_personnel"].sum()
total_sum = female_sum + male_sum
female_percent = female_sum*100/total_sum
male_percent = male_sum*100/total_sum
print(f"Female Percent: {female_percent:.2f}%, Male Percent: {male_percent:.2f}%")
Female Percent: 14.02%, Male Percent: 85.98%
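The pooled share printed here (14.02%) is lower than the mean of the per-row `female_percent` column in the `describe()` output (about 21.10), because the pooled figure weights each row by its size while the row mean treats every row equally. A small sketch of the difference, using three (female, male) pairs taken from the Kenya preview rows:

```python
# Three Kenya rows from the preview table: (female_personnel, male_personnel)
rows = [(44, 681), (1, 3), (5, 16)]

# Pooled share: total women divided by total personnel across all rows
total_female = sum(f for f, _ in rows)
total_all = sum(f + m for f, m in rows)
pooled = 100 * total_female / total_all

# Mean of per-row shares: every row counts equally, whatever its size
row_mean = sum(100 * f / (f + m) for f, m in rows) / len(rows)

print(f"pooled: {pooled:.2f}%, mean of rows: {row_mean:.2f}%")
```

Small detachments with a high share of women raise the row mean well above the pooled share, which is dominated by the large troop contingents.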
kenya_by_year = dataframe_kenya.groupby('year')[['female_personnel','male_personnel']].sum()
kenya_by_year['total_personnel'] = kenya_by_year['female_personnel'] + kenya_by_year['male_personnel']
kenya_by_year['female_percent'] = round(kenya_by_year['female_personnel'] * 100 / kenya_by_year['total_personnel'],2)
kenya_by_year['male_percent'] = round(kenya_by_year['male_personnel'] * 100 / kenya_by_year['total_personnel'],2)
kenya_by_year['number_of_mission'] = dataframe_kenya.groupby('year').size()
kenya_by_year
| year | female_personnel | male_personnel | total_personnel | female_percent | male_percent | number_of_mission |
|---|---|---|---|---|---|---|
| 2010 | 717 | 9807 | 10524 | 6.81 | 93.19 | 94 |
| 2011 | 822 | 9345 | 10167 | 8.08 | 91.92 | 82 |
| 2012 | 826 | 9296 | 10122 | 8.16 | 91.84 | 97 |
| 2013 | 1207 | 8370 | 9577 | 12.60 | 87.40 | 109 |
| 2014 | 1651 | 8264 | 9915 | 16.65 | 83.35 | 126 |
| 2015 | 2046 | 8723 | 10769 | 19.00 | 81.00 | 136 |
| 2016 | 2416 | 10230 | 12646 | 19.10 | 80.90 | 146 |
| 2017 | 226 | 1126 | 1352 | 16.72 | 83.28 | 62 |
| 2018 | 335 | 1871 | 2206 | 15.19 | 84.81 | 172 |
| 2019 | 501 | 1457 | 1958 | 25.59 | 74.41 | 164 |
| 2020 | 498 | 1433 | 1931 | 25.79 | 74.21 | 186 |
| 2021 | 423 | 2232 | 2655 | 15.93 | 84.07 | 186 |
| 2022 | 617 | 3698 | 4315 | 14.30 | 85.70 | 198 |
| 2023 | 747 | 4058 | 4805 | 15.55 | 84.45 | 181 |
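Two points are worth keeping in mind when reading this table. The preview rows suggest that each `last_reporting_date` is a monthly snapshot, so a yearly sum counts the same deployed personnel once per reporting month. Also, `number_of_mission` is the number of rows per year, not the number of distinct missions. The sketch below, on a hypothetical mini-dataset (`sample`), shows one way to compute an average headcount per snapshot and a distinct-mission count instead; it assumes the monthly-snapshot reading is correct.

```python
import pandas as pd

# Hypothetical mini-dataset: two snapshots in 2010, one in 2011
sample = pd.DataFrame({
    "last_reporting_date": ["2010-01-31", "2010-01-31", "2010-02-28", "2011-01-31"],
    "year": [2010, 2010, 2010, 2011],
    "mission_acronym": ["UNMIS", "UNAMID", "UNMIS", "UNMISS"],
    "total_personnel": [725, 80, 725, 100],
})

# Headcount per reporting date, then averaged within each year
per_snapshot = sample.groupby(["year", "last_reporting_date"])["total_personnel"].sum()
avg_headcount = per_snapshot.groupby(level="year").mean()

# Distinct missions per year, as opposed to the number of rows
distinct_missions = sample.groupby("year")["mission_acronym"].nunique()

print(avg_headcount)
print(distinct_missions)
```

An average per snapshot is also less sensitive to years with fewer reporting rows, such as the drop in row count visible for 2017 in the table above.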
import plotly.graph_objects as go
# Line Plot: Total Personnel Over Years
fig1 = go.Figure()
fig1.add_trace(go.Scatter(
x=kenya_by_year.index,
y=kenya_by_year['total_personnel'],
mode='lines+markers',
name='Total Personnel',
line=dict(color='blue')
))
fig1.update_layout(
title='Total Personnel from Kenya in UN Missions Over Years',
xaxis_title='Year',
yaxis_title='Total Personnel',
legend=dict(title='Personnel Type'),
showlegend=True
)
# Stacked Bar Plot: Gender Distribution Over Years
fig2 = go.Figure()
fig2.add_trace(go.Bar(
x=kenya_by_year.index,
y=kenya_by_year['female_personnel'],
name='Female Personnel',
marker=dict(color='purple')
))
fig2.add_trace(go.Bar(
x=kenya_by_year.index,
y=kenya_by_year['male_personnel'],
name='Male Personnel',
marker=dict(color='orange'),
base=kenya_by_year['female_personnel']
))
fig2.update_layout(
title='Gender Distribution of Kenya\'s Contributions in UN Missions Over Years',
xaxis_title='Year',
yaxis_title='Personnel Count',
barmode='stack',
legend=dict(title='Gender'),
showlegend=True
)
# Bar Plot: Number of Missions Over Years
fig3 = go.Figure()
fig3.add_trace(go.Bar(
x=kenya_by_year.index,
y=kenya_by_year['number_of_mission'],
marker=dict(color='green')
))
fig3.update_layout(
title='Number of Missions Contributed by Kenya Over Years',
xaxis_title='Year',
yaxis_title='Number of Missions',
showlegend=False
)
fig1.show()
fig2.show()
fig3.show()
kenya_by_mission = dataframe_kenya.groupby('mission_acronym')[['female_personnel','male_personnel']].sum()
kenya_by_mission['total_personnel'] = kenya_by_mission['female_personnel'] + kenya_by_mission['male_personnel']
kenya_by_mission['female_percent'] = round(kenya_by_mission['female_personnel'] * 100 / kenya_by_mission['total_personnel'],2)
kenya_by_mission['male_percent'] = round(kenya_by_mission['male_personnel'] * 100 / kenya_by_mission['total_personnel'],2)
kenya_by_mission['nb_mission']=dataframe_kenya["mission_acronym"].value_counts()
kenya_by_mission
| mission_acronym | female_personnel | male_personnel | total_personnel | female_percent | male_percent | nb_mission |
|---|---|---|---|---|---|---|
| MINURCAT | 0 | 37 | 37 | 0.00 | 100.00 | 11 |
| MINURSO | 0 | 28 | 28 | 0.00 | 100.00 | 28 |
| MINUSCA | 362 | 1056 | 1418 | 25.53 | 74.47 | 196 |
| MINUSMA | 159 | 755 | 914 | 17.40 | 82.60 | 164 |
| MONUC | 10 | 126 | 136 | 7.35 | 92.65 | 6 |
| MONUSCO | 1347 | 9915 | 11262 | 11.96 | 88.04 | 337 |
| UNAMID | 1778 | 10359 | 12137 | 14.65 | 85.35 | 305 |
| UNIFIL | 64 | 174 | 238 | 26.89 | 73.11 | 127 |
| UNISFA | 49 | 109 | 158 | 31.01 | 68.99 | 60 |
| UNMHA | 13 | 15 | 28 | 46.43 | 53.57 | 29 |
| UNMIL | 319 | 1264 | 1583 | 20.15 | 79.85 | 112 |
| UNMIS | 967 | 13186 | 14153 | 6.83 | 93.17 | 54 |
| UNMISS | 7916 | 42794 | 50710 | 15.61 | 84.39 | 401 |
| UNSMIS | 0 | 7 | 7 | 0.00 | 100.00 | 3 |
| UNSOM | 48 | 20 | 68 | 70.59 | 29.41 | 41 |
| UNSOS | 0 | 65 | 65 | 0.00 | 100.00 | 65 |
# Drop the record count and percentage columns; only the personnel counts are plotted
unwanted_columns = ['nb_mission', 'female_percent', 'male_percent']
kenya_by_mission_subset = kenya_by_mission.drop(unwanted_columns, axis=1, errors='ignore')
kenya_by_mission_subset = kenya_by_mission_subset.select_dtypes(include='number')
sorted_data = kenya_by_mission_subset.sort_values(by='total_personnel', ascending=False)
fig = px.bar(sorted_data, x=sorted_data.index, y=['female_personnel', 'male_personnel'],
labels={'value': 'Personnel Count', 'variable': 'Gender'},
title='Gender Distribution by Mission in Kenya',
color_discrete_map={'female_personnel': 'purple', 'male_personnel': 'orange'},
width=1000, height=600)
fig.update_layout(barmode='stack', legend=dict(title='Gender', orientation='h', x=0, y=1.1),
xaxis_title='Mission Acronym', yaxis_title='Personnel Count')
fig.show()
kenya_by_personnel_type = dataframe_kenya.groupby('personnel_type')[['female_personnel','male_personnel']].sum()
kenya_by_personnel_type['total_personnel'] = kenya_by_personnel_type['female_personnel'] + kenya_by_personnel_type['male_personnel']
kenya_by_personnel_type['female_percent'] = round(kenya_by_personnel_type['female_personnel'] * 100 / kenya_by_personnel_type['total_personnel'],2)
kenya_by_personnel_type['male_percent'] = round(kenya_by_personnel_type['male_personnel'] * 100 / kenya_by_personnel_type['total_personnel'],2)
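# Note: value_counts() counts contribution records (rows) per personnel type, not distinct missions
kenya_by_personnel_type['nb_mission'] = dataframe_kenya["personnel_type"].value_counts()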
kenya_by_personnel_type
| personnel_type | female_personnel | male_personnel | total_personnel | female_percent | male_percent | nb_mission |
|---|---|---|---|---|---|---|
| Experts on Mission | 731 | 3441 | 4172 | 17.52 | 82.48 | 696 |
| Individual Police | 1304 | 3615 | 4919 | 26.51 | 73.49 | 398 |
| Staff Officer | 801 | 2416 | 3217 | 24.90 | 75.10 | 406 |
| Troops | 10196 | 70438 | 80634 | 12.64 | 87.36 | 439 |
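The `nb_mission` column above comes from `value_counts()`, so it counts rows in the dataset rather than distinct missions. A minimal sketch of how the two quantities could be separated is shown below. The `toy` frame and the column names `nb_records` and `nb_distinct_missions` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for dataframe_kenya (illustrative values, not taken from the dataset)
toy = pd.DataFrame({
    "personnel_type": ["Troops", "Troops", "Troops", "Staff Officer"],
    "mission_acronym": ["UNMISS", "UNMISS", "UNAMID", "UNMISS"],
})

grouped = toy.groupby("personnel_type")["mission_acronym"]

counts = pd.DataFrame({
    # Number of rows (contribution records) per personnel type
    "nb_records": grouped.size(),
    # Number of different missions each personnel type served in
    "nb_distinct_missions": grouped.nunique(),
})
print(counts)
```

Applied to `dataframe_kenya`, the same pattern would report both the record count and the number of distinct missions for each personnel type.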
import plotly.graph_objects as go
# Data
labels = kenya_by_personnel_type.index
missions_count = kenya_by_personnel_type['nb_mission']
bar_positions = list(range(len(labels)))
fig = go.Figure()
# Plot the number of contribution records per personnel type
fig.add_trace(go.Bar(
x=bar_positions,
y=missions_count,
marker_color='green'
))
fig.update_layout(
xaxis=dict(tickmode='array', tickvals=bar_positions, ticktext=labels),
xaxis_title='Personnel Type',
yaxis_title='Number of Contribution Records',
title='Number of Kenyan Contribution Records by Personnel Type'
)
for i, mission_count in enumerate(missions_count):
fig.add_annotation(
x=bar_positions[i],
y=mission_count + 1,
text=str(mission_count),
showarrow=False,
font=dict(color='black')
)
fig.show()
import plotly.graph_objects as go
# Data
labels = kenya_by_personnel_type.index
female_percentages = kenya_by_personnel_type['female_percent']
male_percentages = kenya_by_personnel_type['male_percent']
bar_positions = list(range(len(labels)))
fig = go.Figure()
# Plot Female Percentages
fig.add_trace(go.Bar(
x=bar_positions,
y=female_percentages,
name='Female',
marker_color='purple'
))
# Plot Male Percentages
fig.add_trace(go.Bar(
x=bar_positions,
y=male_percentages,
name='Male',
marker_color='orange'
))
fig.update_layout(
barmode='stack',
xaxis=dict(tickmode='array', tickvals=bar_positions, ticktext=labels),
xaxis_title='Personnel Type',
yaxis_title='Percentage',
title='Gender Share of Kenyan Personnel by Personnel Type'
)
for i, (female_percentage, male_percentage) in enumerate(zip(female_percentages, male_percentages)):
fig.add_annotation(
x=bar_positions[i],
y=female_percentage + male_percentage + 1,
text=f'{female_percentage + male_percentage:.2f}%',
showarrow=False,
font=dict(color='black')
)
# Show figure
fig.show()
dataframe_coordinates = pd.read_csv("DPO-UCHISTORICAL_coordinates2.csv",encoding = 'utf-8',sep=';')
dataframe_coordinates.dtypes
dataframe_coordinates["Value of GDP (million)"] = dataframe_coordinates["Value of GDP (million)"].astype(float)
dataframe_coordinates["Value of population (million)"] = dataframe_coordinates["Value of population (million)"].astype(float)
dataframe_coordinates_world = pd.DataFrame()
dataframe_coordinates_world['number_of_contributions'] = dataframe_coordinates.groupby('contributing_country')['contributing_country'].count()
dataframe_coordinates_world["Value of GDP (million)"] = dataframe_coordinates.groupby('contributing_country')["Value of GDP (million)"].mean()
dataframe_coordinates_world["Value of population (million)"] = dataframe_coordinates.groupby('contributing_country')["Value of population (million)"].mean()
dataframe_coordinates_world["Latitude"] = dataframe_coordinates.groupby('contributing_country')["Latitude"].mean()
dataframe_coordinates_world["Longitude"] = dataframe_coordinates.groupby('contributing_country')["Longitude"].mean()
dataframe_coordinates_world['sum_female_personnel'] = dataframe_coordinates.groupby('contributing_country')['female_personnel'].sum()
dataframe_coordinates_world['sum_male_personnel'] = dataframe_coordinates.groupby('contributing_country')['male_personnel'].sum()
dataframe_coordinates_world['total_personnel'] = dataframe_coordinates_world['sum_female_personnel']+dataframe_coordinates_world['sum_male_personnel']
dataframe_coordinates_world['percent_female_personnel'] = round(dataframe_coordinates_world['sum_female_personnel'] * 100 / dataframe_coordinates_world['total_personnel'],2)
dataframe_coordinates_world['percent_male_personnel'] = round(dataframe_coordinates_world['sum_male_personnel'] * 100 / dataframe_coordinates_world['total_personnel'],2)
dataframe_coordinates_world.isna().sum()
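# Assign the result back instead of using inplace=True on a column selection,
# which may not update the parent DataFrame in recent pandas versions
dataframe_coordinates_world["percent_female_personnel"] = dataframe_coordinates_world["percent_female_personnel"].fillna(0)
dataframe_coordinates_world["percent_male_personnel"] = dataframe_coordinates_world["percent_male_personnel"].fillna(0)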
dataframe_coordinates_world.reset_index(inplace=True)
dataframe_coordinates_world.columns
Index(['contributing_country', 'number_of_contributions',
'Value of GDP (million)', 'Value of population (million)', 'Latitude',
'Longitude', 'sum_female_personnel', 'sum_male_personnel',
'total_personnel', 'percent_female_personnel',
'percent_male_personnel'],
dtype='object')
dataframe_coordinates_world
| | contributing_country | number_of_contributions | Value of GDP (million) | Value of population (million) | Latitude | Longitude | sum_female_personnel | sum_male_personnel | total_personnel | percent_female_personnel | percent_male_personnel |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | 1 | 14.939 | 41.13 | 33.939110 | 67.709953 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 |
| 1 | Albania | 85 | 18.260 | 2.84 | 41.153332 | 20.168331 | 72.0 | 481.0 | 553.0 | 13.02 | 86.98 |
| 2 | Algeria | 155 | 163.473 | 44.90 | 28.033886 | 1.659626 | 0.0 | 456.0 | 456.0 | 0.00 | 100.00 |
| 3 | Angola | 17 | 70.533 | 35.59 | -11.202692 | 17.873887 | 17.0 | 17.0 | 34.0 | 50.00 | 50.00 |
| 4 | Argentina | 1290 | 487.227 | 45.51 | -38.416097 | -63.616672 | 6630.0 | 81531.0 | 88161.0 | 7.52 | 92.48 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 148 | Vanuatu | 78 | 981.000 | 0.33 | -15.376706 | 166.959158 | 47.0 | 444.0 | 491.0 | 9.57 | 90.43 |
| 149 | Viet Nam | 474 | 366.138 | 98.19 | 14.058324 | 108.277199 | 1203.0 | 7161.0 | 8364.0 | 14.38 | 85.62 |
| 150 | Yemen | 1489 | 9.947 | 33.70 | 15.552727 | 48.516388 | 0.0 | 23853.0 | 23853.0 | 0.00 | 100.00 |
| 151 | Zambia | 2024 | 21.313 | 20.02 | -13.133897 | 27.849332 | 15031.0 | 100111.0 | 115142.0 | 13.05 | 86.95 |
| 152 | Zimbabwe | 1509 | 24.118 | 16.32 | -19.015438 | 29.154857 | 5398.0 | 8224.0 | 13622.0 | 39.63 | 60.37 |
153 rows × 11 columns
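The country-level table above is built with one `groupby` call per column. As a hedged alternative, the same kind of summary could be produced in a single pass with named aggregation. The sketch below uses a small made-up frame (`toy`) and only a subset of the columns. The coordinates and counts are placeholders, not values from `DPO-UCHISTORICAL_coordinates2.csv`.

```python
import pandas as pd

# Toy stand-in for dataframe_coordinates (placeholder values)
toy = pd.DataFrame({
    "contributing_country": ["Kenya", "Kenya", "Ghana"],
    "Latitude": [-0.02, -0.02, 7.95],
    "Longitude": [37.9, 37.9, -1.02],
    "female_personnel": [2.0, 3.0, 0.0],
    "male_personnel": [8.0, 7.0, 0.0],
})

# One groupby pass: each keyword is an output column, each tuple is (source column, function)
world = toy.groupby("contributing_country").agg(
    number_of_contributions=("female_personnel", "size"),
    Latitude=("Latitude", "mean"),
    Longitude=("Longitude", "mean"),
    sum_female_personnel=("female_personnel", "sum"),
    sum_male_personnel=("male_personnel", "sum"),
).reset_index()

world["total_personnel"] = world["sum_female_personnel"] + world["sum_male_personnel"]

# Countries with zero personnel give 0/0 = NaN, so replace those shares with 0
female_share = world["sum_female_personnel"] * 100 / world["total_personnel"]
male_share = world["sum_male_personnel"] * 100 / world["total_personnel"]
world["percent_female_personnel"] = female_share.round(2).fillna(0)
world["percent_male_personnel"] = male_share.round(2).fillna(0)
print(world)
```

Column names that contain spaces, such as `Value of GDP (million)`, cannot be written as keywords; they could be passed through dictionary unpacking, for example `**{"Value of GDP (million)": ("Value of GDP (million)", "mean")}`.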
fig = px.scatter_geo(
dataframe_coordinates_world,
lat="Latitude",
lon="Longitude",
size="total_personnel",
hover_name="contributing_country",
projection="natural earth",
title="Contributions by Country (personnel)",
template="plotly",
color="total_personnel",
color_continuous_scale="Viridis",
)
fig.update_layout(
geo=dict(showland=True),
margin=dict(l=0, r=0, t=40, b=0),
)
fig.show()
fig = px.scatter_geo(
dataframe_coordinates_world,
lat="Latitude",
lon="Longitude",
size="Value of GDP (million)",
hover_name="contributing_country",
projection="natural earth",
title="GDP by Country (million)",
template="plotly",
color="Value of GDP (million)",
color_continuous_scale="Viridis",
)
fig.update_layout(
geo=dict(showland=True),
margin=dict(l=0, r=0, t=40, b=0),
)
fig.show()
fig = px.scatter_geo(
dataframe_coordinates_world,
lat="Latitude",
lon="Longitude",
size="Value of population (million)",
hover_name="contributing_country",
projection="natural earth",
title="Population by Country (million)",
template="plotly",
color="Value of population (million)",
color_continuous_scale="Viridis",
)
fig.update_layout(
geo=dict(showland=True),
margin=dict(l=0, r=0, t=40, b=0),
)
fig.show()
Kenya's contributions to UN peace missions show sustained commitment across several dimensions:
Active Engagement:
Yearly Impact:
Gender Parity:
Troop Contribution:
In Closing:
Annex:
Additional Resources: